Multi-task Pairwise Neural Ranking for Hashtag Segmentation
Hashtags are often employed on social media and beyond to add metadata to a
textual utterance with the goal of increasing discoverability, aiding search,
or providing additional semantics. However, the semantic content of hashtags is
not straightforward to infer as these represent ad-hoc conventions which
frequently include multiple words joined together and can include abbreviations
and unorthodox spellings. We build a dataset of 12,594 hashtags split into
individual segments and propose a set of approaches for hashtag segmentation by
framing it as a pairwise ranking problem between candidate segmentations. Our
novel neural approaches achieve a 24.6% error reduction in hashtag
segmentation accuracy compared to the current state-of-the-art method. Finally,
we demonstrate that a deeper understanding of hashtag semantics obtained
through segmentation is useful for downstream applications such as sentiment
analysis, for which we achieved a 2.6% increase in average recall on the
SemEval 2017 sentiment analysis dataset. Comment: 12 pages, ACL 201
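A minimal sketch of segmentation framed as pairwise ranking between candidate segmentations. It assumes a toy unigram log-probability table (TOY_LOGPROB, hypothetical) and a stub comparator prefer() standing in for the learned neural pairwise model described above; the candidate that wins the most pairwise comparisons is returned.

```python
from itertools import combinations

# Hypothetical toy unigram log-probabilities; a real system would use learned
# or corpus-derived features. Unknown segments receive a heavy penalty.
TOY_LOGPROB = {"ice": -4.0, "cold": -4.5, "i": -2.0, "old": -4.2, "icecold": -11.0}
UNK = -20.0


def candidates(tag, max_splits=3):
    """Enumerate segmentations of `tag` with at most `max_splits` split points."""
    n = len(tag)
    results = []
    for k in range(max_splits + 1):
        for cuts in combinations(range(1, n), k):
            bounds = (0,) + cuts + (n,)
            results.append(tuple(tag[i:j] for i, j in zip(bounds, bounds[1:])))
    return results


def score(seg):
    """Stand-in feature score: sum of toy unigram log-probabilities."""
    return sum(TOY_LOGPROB.get(w, UNK) for w in seg)


def prefer(a, b):
    """Pairwise ranker stub: True if `a` should rank above `b`.
    The paper uses a neural model here; this only compares toy scores."""
    return score(a) > score(b)


def best_segmentation(tag):
    """Return the candidate that wins the most pairwise comparisons."""
    cands = candidates(tag)
    wins = {c: 0 for c in cands}
    for a, b in combinations(cands, 2):
        wins[a if prefer(a, b) else b] += 1
    return max(cands, key=lambda c: wins[c])


print(best_segmentation("icecold"))  # ('ice', 'cold')
```

Enumerating every split point grows exponentially with hashtag length, which is why the sketch caps the number of splits; a practical system would prune candidates before ranking.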
Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary tasks
Effectively leveraging multimodal information from social media posts is
essential to various downstream tasks such as sentiment analysis, sarcasm
detection and hate speech classification. However, combining text and image
information is challenging because of the idiosyncratic cross-modal semantics
with hidden or complementary information present in matching image-text pairs.
In this work, we aim to directly model this by proposing the use of two
auxiliary losses jointly with the main task when fine-tuning any pre-trained
multimodal model. Image-Text Contrastive (ITC) brings image-text
representations of a post closer together and separates them from different
posts, capturing underlying dependencies. Image-Text Matching (ITM) facilitates
the understanding of semantic correspondence between images and text by
penalizing unrelated pairs. We combine these objectives with five multimodal
models, demonstrating consistent improvements across four popular social media
datasets. Furthermore, through detailed analysis, we shed light on the specific
scenarios and cases where each auxiliary task proves to be most effective.
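The ITC objective above can be sketched as a symmetric contrastive loss over a batch of posts: image i and text i form the positive pair and all other pairings in the batch act as negatives. This is a standard InfoNCE-style formulation assumed for illustration, not necessarily the exact loss used in the paper; the temperature of 0.07 is an assumed value, and ITM (a binary matched/unmatched classifier) is not shown.

```python
import math


def _normalize(v):
    # Assumes a non-zero embedding vector.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def itc_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric image-text contrastive loss (image->text and text->image)."""
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(transposed))
```

Matching pairs pull the loss towards zero, while swapping the texts between posts makes it large, which is the "brings closer together and separates" behaviour described in the abstract.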
Analysing domain suitability of a sentiment lexicon by identifying distributionally bipolar words
Contemporary sentiment analysis approaches rely heavily on lexicon-based methods. This is mainly due to their simplicity, although the best empirical results can be achieved by more complex techniques. We introduce a method to assess the suitability of generic sentiment lexicons for a given domain, namely by identifying frequent bigrams in which a polar word switches polarity. Our bigrams are scored using Lexicographer's Mutual Information, leveraging large automatically obtained corpora. Our score matches human perception of polarity, and our enhanced context-aware method improves classification results. Our method enhances the assessment of lexicon-based sentiment detection algorithms and can further be used to quantify ambiguous words.
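A minimal sketch of Lexicographer's Mutual Information scoring for adjacent bigrams, using the common definition LMI(x, y) = f(x, y) * log2(f(x, y) * N / (f(x) * f(y))), with N taken here as the number of tokens (an assumption). The helper bigrams_with() is hypothetical and only ranks bigrams containing a given lexicon word; how the authors decide that polarity has switched is not reproduced.

```python
import math
from collections import Counter


def lmi_scores(tokens):
    """Lexicographer's Mutual Information for every adjacent bigram.

    PMI weighted by the bigram frequency f(x, y), so frequent associations
    outrank rare ones.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    return {
        (x, y): f_xy * math.log2(f_xy * n / (unigrams[x] * unigrams[y]))
        for (x, y), f_xy in bigrams.items()
    }


def bigrams_with(word, scores, top_k=5):
    """Highest-LMI bigrams that contain `word` (e.g. a lexicon entry)."""
    hits = [(bg, s) for bg, s in scores.items() if word in bg]
    return sorted(hits, key=lambda item: item[1], reverse=True)[:top_k]
```

Weighting by frequency is the key difference from plain PMI, which tends to overrate pairs that occur only once or twice in a corpus.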
Sentiment analysis with genetically evolved Gaussian kernels
Sentiment analysis consists of evaluating opinions or statements based on text analysis. Among the methods used to estimate the degree to which a text expresses a certain sentiment are those based on Gaussian Processes. However, traditional Gaussian Process methods use predefined kernels with hyperparameters that can be tuned but whose structure cannot be adapted. In this paper, we propose the application of Genetic Programming to evolve Gaussian Process kernels that are more precise for sentiment analysis. We use a very flexible representation of kernels combined with a multi-objective approach that simultaneously considers two quality metrics and the computational time required to evaluate those kernels. Our results show that the algorithm can outperform Gaussian Processes with traditional kernels for some of the sentiment analysis tasks considered.
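One way to make kernel structure evolvable is to encode kernels as expression trees. The sketch below assumes a hypothetical grammar (base kernels rbf, linear and const, combined by add and mul), a random tree generator for population initialization, and subtree mutation. It is not the paper's representation, and the multi-objective selection step is omitted.

```python
import math
import random

# Hypothetical kernel grammar. Sums and products of valid kernels are valid
# kernels, so every tree produced here is a valid covariance function.
BASE = ("rbf", "linear", "const")
OPS = ("add", "mul")


def evaluate(tree, x, y):
    """Compute k(x, y) for the kernel expression `tree`."""
    kind = tree[0]
    if kind == "rbf":
        sq = sum((a - b) ** 2 for a, b in zip(x, y))
        return math.exp(-sq / (2 * tree[1] ** 2))  # tree[1] is the lengthscale
    if kind == "linear":
        return sum(a * b for a, b in zip(x, y))
    if kind == "const":
        return tree[1]  # positive constant kernel
    left, right = evaluate(tree[1], x, y), evaluate(tree[2], x, y)
    return left + right if kind == "add" else left * right


def random_kernel(rng, depth=2):
    """Grow a random kernel tree, as in GP population initialization."""
    if depth == 0 or rng.random() < 0.3:
        kind = rng.choice(BASE)
        if kind == "linear":
            return ("linear",)
        return (kind, rng.uniform(0.1, 2.0))
    return (rng.choice(OPS), random_kernel(rng, depth - 1),
            random_kernel(rng, depth - 1))


def mutate(tree, rng, depth=2):
    """Subtree mutation: replace this node or recurse into one child."""
    if tree[0] not in OPS or rng.random() < 0.3:
        return random_kernel(rng, depth)
    if rng.random() < 0.5:
        return (tree[0], mutate(tree[1], rng, depth), tree[2])
    return (tree[0], tree[1], mutate(tree[2], rng, depth))
```

Restricting the grammar to sums and products keeps every evolved kernel positive semi-definite, so no candidate has to be discarded as an invalid covariance function.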
Studying user income through language, behaviour and affect in social media
Automatically inferring user demographics from social media posts is useful for both social science research and a range of downstream applications in marketing and politics. We present the first extensive study where user behaviour on Twitter is used to build a predictive model of income. We apply non-linear methods for regression, i.e. Gaussian Processes, achieving strong correlation between predicted and actual user income. This allows us to shed light on the factors that characterise income on Twitter and analyse their interplay with user emotions and sentiment, perceived psycho-demographics and language use expressed through the topics of their posts. Our analysis uncovers correlations between different feature categories and income, some of which reflect common belief (e.g. higher perceived education and intelligence indicate higher earnings) or known differences (e.g. gender and age differences), while others are novel findings (e.g. higher income users express more fear and anger, whereas lower income users express more of the time emotion and opinions).
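Gaussian Process regression, as named above, can be sketched with the standard posterior-mean formula, predicting k(X_new, X) (K + noise*I)^-1 (y - mean) + mean. This stdlib-only sketch assumes an RBF kernel, a fixed noise level and centred targets; the feature vectors (for example topic proportions per user) and income values are hypothetical, and hyperparameter fitting is omitted.

```python
import math


def rbf(x, y, lengthscale=1.0, variance=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return variance * math.exp(-sq / (2 * lengthscale ** 2))


def cholesky(a):
    """Lower-triangular L with L L^T == a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cho_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict_mean(X, y, X_new, noise=0.1, lengthscale=1.0):
    """GP regression posterior mean with a zero-mean prior on centred targets."""
    mean_y = sum(y) / len(y)
    K = [[rbf(a, b, lengthscale) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    alpha = cho_solve(cholesky(K), [v - mean_y for v in y])
    return [mean_y + sum(rbf(x, xi, lengthscale) * a for xi, a in zip(X, alpha))
            for x in X_new]
```

Far from any training user the kernel values vanish and the prediction falls back to the mean income, while near the training data the non-linear kernel lets predictions follow local structure.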
Combining Humor and Sarcasm for Improving Political Parody Detection
Parody is a figurative device used for mimicking entities for comedic or
critical purposes. Parody is intentionally humorous and often involves sarcasm.
This paper explores jointly modelling these figurative tropes with the goal of
improving performance of political parody detection in tweets. To this end, we
present a multi-encoder model that combines three parallel encoders to enrich
parody-specific representations with humor and sarcasm information. Experiments
on a publicly available data set of political parody tweets demonstrate that
our approach outperforms previous state-of-the-art methods. Comment: Accepted at NAACL 202
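The multi-encoder idea can be sketched as three parallel encoders whose outputs are concatenated and passed to a classifier. This sketch assumes concatenation followed by a logistic layer; the hashed bag-of-words toy_encoder() is a hypothetical stand-in for the trained parody, humor and sarcasm encoders, and the weights are placeholders rather than learned parameters.

```python
import math
import zlib


def toy_encoder(name, dim=8):
    """Hypothetical stand-in for a trained encoder: hashed bag of words.

    `name` salts the hash so each encoder produces a different view.
    """
    def encode(text):
        vec = [0.0] * dim
        for token in text.lower().split():
            vec[zlib.crc32(f"{name}:{token}".encode()) % dim] += 1.0
        return vec
    return encode


ENCODERS = [toy_encoder("parody"), toy_encoder("humor"), toy_encoder("sarcasm")]


def combined_representation(text):
    """Run the three encoders in parallel and concatenate their outputs."""
    return [x for enc in ENCODERS for x in enc(text)]


def parody_probability(text, weights, bias=0.0):
    """Numerically stable logistic classifier over the concatenated vector."""
    z = bias + sum(w * x for w, x in zip(weights, combined_representation(text)))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

Concatenation is the simplest fusion choice; it keeps the humor and sarcasm signals as separate feature blocks that the classifier can weight independently.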